ABSTRACT

The design and implementation of an interactive,
incremental assembly system on an INTEL 8080-based
microcomputer has been described. Instead of
requiring separate editing, assembling and debugging
steps, the system allows entry, translation and error
checking simultaneously. The implementation
comprises an integrated set of modules which
assemble and execute the source code. The design
goals, solutions, and recommendations for further
expansion of the system have been presented. The
system was implemented in PL/M for use in a
diskette-based environment.

The software development process typically has been a
sequence of three steps. First the user would create his
program on some machine readable medium. Secondly he would
assemble or compile his source code, going back to step one
as needed to correct syntax errors until a successful
assembly or compilation was achieved. The third step would
involve debugging and running the resultant object code from
the assembly or compilation. Of course, if there were any
program errors detected in the final step the user would
have to retreat to step one and repeat the entire process
until he was satisfied that his program would perform in the
manner for which it was created.

During this typical three step process, the earliest
point at which the user had any feedback from the computer
as to the correctness of the syntax of his source code was
the completion of the first attempt at compilation or
assembly. The user had no idea at all as to how his program
would execute until a complete assembly or compilation had
been achieved.

Assembly System for Interactive Development (ASID) was
an attempt to demonstrate that the user could begin to
receive information regarding his source program at the
earliest practical point. Not only could the user receive
immediate feedback concerning the correctness of the syntax
of each program sentence, but it has been shown that he can
receive much helpful information concerning the logical
construction of his program at the same time.

A. PHYSIOLOGICAL WEAKNESSES OF THE HUMAN BEING

The designer of an effective man-machine interface must
be aware of the basic weaknesses of the human physiology as
it affects the man-machine relationship. The designer
should also account for the differences in experience levels
among the user population.

The computer, with its capability for exact recall over
potentially infinite periods of time, should be used to aid
the user in reconstructing events which he cannot recall
precisely. Another important factor is anxiety in the user.
The machine must respond with some sort of signal, either
audio or visual, to reassure the user that it is working for
him. Without this periodic reassurance, the user can become
perplexed and lose his train of thought.

User attitude can also affect the man-machine interface.
Flexibility in the interface, allowing the experienced user
to take shortcuts or providing optional verbosity for the
inexperienced user, can help to promote a more relaxed
atmosphere which is conducive to the high level of
concentration needed to develop and debug programs.

B. DEBUGGING TIME

Once the user has established an interface with the
machine and begins to use the computer to perform tasks, the
machine should aid the user during all phases of the
software development process. If the user determines that
the tasks given the machine were not performed properly or
the machine determines that it does not know what the task
given to it means, then the computer should respond with a
set of helpful notices. That is, the machine must aid the
user in determining why the specific task was not performed
or why the task was not performed properly. This function
is collectively called debugging and separates into two
portions. The first portion is concerned with eliminating
assembly/compile time errors or syntax errors. The computer
is generally quite proficient at detecting and alerting the
user to misuses of the input language.

The second part of debugging has to do with run-time or
"logic" errors. These errors manifest themselves as
unexpected results from execution of the object program.
The machine is not proficient at locating errors of this
type unless the machine or operating system is put into an
abnormal state, e.g. by attempting to execute a data area.
Controlled execution monitors (debuggers) are the most
effective tools with which to confront logic or run-time
errors.

Some of the most common run-time errors that occur are
grouped into the following categories: 1) initialization, 2)
addressing, 3) referencing, 4) counting and calculating, 5)
masking and comparing, 6) estimation of the range of limits,
7) ordering of code [ Ref. 1 ].

THE INTERACTIVE MAN MACHINE INTERFACE

If a user is to make efficient use of his time while
utilizing a computer system, the system must be designed
with the needs of the user in mind. Perhaps one of the best
statements of the recognition that considerable attention
should be given to the user comes from Dr. James Martin of
the IBM Systems Research Institute:

    Increasingly..., man must become the prime focus of the
    system design. The computer is there to serve him, to
    obtain information for him and to help him do his job. The
    ease with which he communicates with it will determine the
    extent to which he uses it. Whether or not he uses it
    powerfully will depend upon the man-machine language
    available to him and how well he is able to understand
    it.

Thus a system should interact with the user. But how
does one discover the "best" method for designing an
interactive system? Can the user simply be asked what he
would like to have happen when he sits down at a terminal?
Apparently not, according to several recent writers on the
subject. Further, quite contrary to popular opinion,
"armchair" intuitive design techniques have not provided a
sufficient basis for system designers to use [ Ref. 2 ].
One approach is to allow the interface between the man and
the machine to be alterable by the user under operating
conditions without the necessity for reprogramming the
system [ Ref. 2 ]. Therefore the user interface is
originally coded with the capability for making differential
responses to a variety of users under a wide range of
conditions. This gives the interactive system interface
flexibility. The question then becomes how much flexibility
is required by the user.


1. History

There are three kinds of interactive high-level
language systems: 1) interactive compilation systems,
2) interactive interpretation systems, 3) interactive direct
execution systems [ Ref. 3 ]. The general nature of these
categories of interactive systems is shown in Table I.

An interactive compilation system is an interactive
high-level language system in which a compiler is employed.
Table I lists three types. The type 1(a) system allows the
source code to be input at a terminal and a text editor is
available for making changes and corrections. After the
entire source is entered, the compiler is called to
translate the source code into a block of machine code.
During compilation, the syntax of the source code is checked
and the syntax error messages are later printed out. If
compilation is successful, the machine code is loaded into
memory and executed. This entire process is repeated until
the program is completely debugged of syntax or compilation
time errors. Note that the level of interaction is limited
such that the user must submit his entire program to the
translator before he receives any feedback.

TABLE I

Types of Interactive High-Level Language Systems

1. Interactive Compilation Systems

   Type 1(a):  inputting and text editing the entire source code,
               compiling and syntax checking the entire source code,
               executing the object code.

   Type 1(b):  inputting and text editing the entire source code,
               compiling the entire source code,
               executing the object code.

   Type 1(c):  inputting and text editing and syntax checking
               each line of source code,
               compiling the accumulated source code,
               executing the accumulated source code.

2. Interactive Interpretation Systems

   Type 2(a):  inputting and editing the entire source code,
               interpreting and syntax checking the entire
               source code.

   Type 2(b):  inputting and text editing each line of source code,
               interpreting and syntax checking a line of
               source code.

3. Interactive Direct-Execution Systems

   Type 3(a):  text editing the symbol and the code if in error.

The type 1(b) system is similar to the type 1(a)
system except that the type 1(b) employs a syntax
interpreter for syntax checking before compilation. In this
system syntax errors could be detected by the computer and
corrected by the user prior to the first attempt at
compilation. In some compilers syntax checking is
accomplished as the first pass of the compilation. This is
an improvement in the amount of interaction allowed of the
user because most syntax errors would be found prior to
calling in the compiler. The entire program creation
process should require less time. However, the user has no
opportunity to interact with regard to syntax error
correction until after he has typed in his entire program
and started the syntax checker.

The type 1(c) system interacts with the user at the
level of one line. As each line of the source is entered,
it is syntax checked and then put into the text file.
Whenever the user wishes to execute the source code in the
text file, the source code is then compiled, linked and
executed. Alternatively, each line of the source code could
be entered, syntax checked and compiled, and then placed in
the text file.


An interactive interpretation system is an
interactive high level language system in which an
interpreter is employed. The high-level language is the
programming language. There is neither compilation nor
assembly. The user writes only high-level language
programs.

Two types are shown in Table I. The type 2(a) system
allows the source code to be entered on the terminal and a
text editor is available for making changes and corrections
as the source code is being typed on the terminal. After
the entire source code is typed, the interpreter is called
to check syntax and interpret the source. It is
conceptually simple, but the unit of interaction with this
type of system is again the entire source code.

The type 2(b) system is similar to the type 2(a)
system except that entering, syntax checking and
interpreting are carried out one line at a time. As a
result this system provides more interaction between the
user and the system.

An interactive direct execution system is an
interactive high level language system in which a direct
execution interpreter is employed. As indicated in Table I,
as each symbol of the source code is being entered at the
terminal, the symbol is syntax checked and executed. This
is accomplished by the "interpretive direct execution loop."
An error message is printed out when an error is detected.
There would be no "error snowballing" as could happen during
a compilation run, whereby the compiler erroneously detects
errors in following statements that are in fact
syntactically correct. This process of symbol-by-symbol
typing, checking and execution gives a maximum interaction
between the user and the system.
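
The symbol-by-symbol check-and-execute cycle described here can be illustrated with a minimal sketch. It is written in Python for brevity and is not the implementation of [ Ref. 3 ]; the toy statement form `NAME = NUMBER ;` and the names `DirectExecutor` and `feed` are assumptions made only for this illustration.

```python
class DirectExecutor:
    """Toy symbol-by-symbol checker/executor for statements NAME = NUMBER ;"""

    # Kinds of symbol expected, in statement order (hypothetical toy syntax).
    EXPECT = ["name", "=", "number", ";"]

    def __init__(self):
        self.variables = {}   # results of statements executed so far
        self.pending = []     # symbols accepted for the current statement

    @staticmethod
    def _kind(symbol):
        if symbol in ("=", ";"):
            return symbol
        if symbol.isdigit():
            return "number"
        if symbol.isalpha():
            return "name"
        return "unknown"

    def feed(self, symbol):
        """Check one symbol; execute the statement as soon as it is complete.

        Returns None when the symbol is accepted, or an error message.
        A rejected symbol leaves the state unchanged, so the user can
        retype starting from the symbol in error.
        """
        expected = self.EXPECT[len(self.pending)]
        if self._kind(symbol) != expected:
            return f"ERROR: expected {expected}, got {symbol!r}"
        self.pending.append(symbol)
        if symbol == ";":
            name, _, number, _ = self.pending
            self.variables[name] = int(number)   # immediate execution
            self.pending = []
        return None
```

Because a rejected symbol never enters `pending`, one bad symbol cannot cause later, correct symbols to be reported as errors, which is the absence of "snowballing" noted above.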


The interactive direct-execution system also assists
the user in debugging logical errors by showing the partial
result whenever it is requested. It also allows the user to
command the system to execute the accumulated source code
from the beginning and to display the result at specified
places. When the source code is completely entered, it
could have already run once and could have been partially
debugged. As with the interactive interpreter system, the
user writes only high level language software. The system
could be designed so that once the source code is debugged,
it could then be run without any further syntax checking in
order to speed up the execution. It is conceivable that the
system could be designed so that it serves as a means for
one to learn the high level language after a minimum amount
of reading or instruction.


Referring to the conventional three-step software
development process mentioned in the introduction, the
authors suggest that the syntax checking process and the
first pass of the assembly or compilation could easily be
accomplished concurrently with step one, initial program
entry. The savings in time prior to completing the first
successful assembly or compilation should be significant.
Snowballing or cascading of syntax errors, as is so common in
many language processors, could be all but eliminated. If
the user were utilizing a dedicated computer hardware system
such as one of the increasingly popular microcomputer
development systems, then the CPU would be asked to perform
more work in the same time frame than when only a text
editor is used for program creation.

The most obvious improvement is that the user has
continuous assurance that the work he has done is indeed
syntactically correct. When an error is detected it is
corrected before proceeding.


A much more powerful extension of incremental
processing is to execute each executable segment of the
program as it is successfully parsed. Obviously some
restrictions would have to be made on this incremental
execution. A call to a nonexistent subroutine would not be
allowed. The logic that was to determine whether the
subroutine was to be called or not could be verified,
though. The display of intermediate results at the
termination of each incremental execution would give the
user a fairly detailed view of the logical flow of his
program. All of this is still happening at step one of the
conventional development cycle.


2. State Of The Art Approaches

a. CAPS

An example of an interactive diagnostic
compiler-interpreter system is CAPS which is in use at the
University of Illinois at Urbana-Champaign [ Ref. 4 ]. It
allows beginning programmers to prepare, debug and execute
fairly simple programs at a graphics display terminal.
Complete syntax checking and most semantic analysis is
performed as the program is entered and as it is
subsequently edited. Analysis by the system is performed
character by character. A remarkable feature of CAPS is its
ability to automatically diagnose errors both at compile
time and at run time. Errors are not automatically
corrected. Instead, CAPS interacts with the user to help
him find the cause of the error. Most of the components of
CAPS are table driven, both to reduce the space needed for
implementation and to increase the flexibility of the
system. CAPS supports the beginning programmer who is using
either Fortran, PL/I or Cobol.


The principal modules of the CAPS system are a
program editor, a syntactic and static semantic error
diagnostician, an interpreter for each language supported, a
run time error analyzer, a user program file manager and a
system table builder and a file manager (Figure 1). Control
of the system is distributed throughout the modules. The
user, however, is never aware of this modularity and never
has to remember command syntax because each time the system
is ready for a command the module that will interpret the
command displays a menu of possible actions (Figure 2).


In CAPS, the interactive debugging session is
directed by the system and not by the user. This is
essential because the beginning programmer does not know
what questions to ask; he does not know how to debug. An
added benefit of this is that the user does not have to
learn a command language for the debugging package.

Currently, since CAPS uses the Plato IV system
and has severe time and space constraints imposed on it by
the Plato IV system, its capabilities are limited, and it
has been only a qualified success. Over 500 people have used
it while learning Fortran and PL/I. The diagnostic
assistance in the interactive environment is clearly
superior to any batch system or interactive system for the
beginning programmer. The problem that causes CAPS to be
only a qualified success is the time sharing system in which
it operates. When Plato IV is handling 500 people
simultaneously, even if only 30 terminals run CAPS, the user
gets frustratingly slow response - slow, even for the "hunt
and peck" typist writing in an unfamiliar language.


[Figure 2 shows two CAPS menu displays: one presented before
starting to write a program (workspace empty) and one
presented after editing the old program 'COBOLK' (workspace
contains text).]

Figure 2 - Typical display of possible actions.

b. IHLLDE

IHLLDE is an example of an interactive direct
execution system. It accepts a subset of Algol 60 [ Ref. 3 ].

The system configuration of IHLLDE is shown in
Figure 3. There are seven system units: the monitor, input
processor, direct execution interpreter, text editor,
scanner, I/O processor and teletype. The monitor controls
the operation of all system units directly or indirectly.
All of the inputs from and the outputs to the teletype are
handled by the I/O processor. The scanner is called by the
interpreter only. The monitor operates in four modes:
monitor mode, input mode, edit mode and run mode.


The system has been implemented on the Univac
1108 computer at the University of Maryland. System
operation can best be described by means of an example
terminal session. The teletype output is shown in Figure 4
where the user's input items are indicated by underscoring.
The user began his session by typing "I" to the monitor to
enter the input mode. Then the user proceeded to enter his
program at the terminal. The user misspelled "INTEGER", and
the system responded by printing an asterisk under the
offending symbol together with an error message. The user
retyped the line starting with the symbol in error. The
user next entered two "READ" statements; the system
requested the data by showing "DATA?" after each "READ"
statement, because the "READ" statement had immediately been
executed. The user next mistyped an assignment statement
and, after the system responded, started the correction
from the symbol in error. The user next mistyped the
"WRITE" statement twice and then corrected it. The corrected
"WRITE" statement was immediately executed and the value of
A was output. When the user finished his terminal session,
the system responded with "PROGRAM DONE." Finally the user
called the text editor and gave the command "L"; a listing
of his program was printed as shown at the bottom of Figure
4.

Figure 3 - Configuration of an interactive direct-execution
high-level language system.

Figure 4 - Teletype output of an example session with the
IHLLDE system. (Underscoring and boxes added)


The IHLLDE system also has been designed and
implemented for an INTEL MCS 80 microprocessor system, and
has demonstrated that microprocessor systems are as well
suited for interactive, inexpensive, individual high level
language computer systems as they are for specific uses
which require no further programming. It is known that one
can learn a programming language faster from an interactive
system. The experience of the users of the IHLLDE system
supports this conclusion.


An on-line system designed for interactive use
should be equally attractive to both experienced and
inexperienced users. Because the computer has no method for
evaluating the experience level of the user, this information
would need to be supplied by the user himself. The
implication is that the user should be able to modify the
man-machine interface during his session with the computer.

Research in the area of interface flexibility
clearly indicates that flexibility is not uniformly
effective with all users in optimizing performance. In a
single encounter with an on-line system, users were more
prone to make syntax errors if offered short-cut flexibility
options. Nevertheless, almost all users of a flexible
version of the system worked significantly faster than those
not having the options. The exceptions were the novices who
worked more rapidly without the options than with them
[ Ref. 2 ].


The authors' experiences with the Cambridge
Monitoring System (CMS) seem to support this conclusion.
As a higher level of proficiency in programming, in handling
the editor and monitor commands and in keyboard entry itself
was acquired, previously unannoying responses became
increasingly bothersome. For example the "Ready" message
response after execution of each CMS command became
bothersome, especially when the terminal was communicating
with the computer at 110 baud. CMS allows the user to turn
this message off.

To cater efficiently to the general population of
users the man-machine interface should be designed so that
the user would be able to shape the details of the interface
for his own convenience.


SUPPORT OF STRUCTURED PROGRAMMING

In order to meet the needs of the user and decrease the
time to develop reliable, efficient software a system should
not only be interactive and flexible but it should also
support the use of structured programming.

Structured programming is a technique that embraces the
goals of reliability, maintainability and flexibility in
software design and implementation [ Refs. 5 and 6 ]. In
the initial design stages, structured programming (SP)
begins in the form of structured flow charts or pseudo
language macros. These in turn consist of and are
restricted to a small, well-defined set of program flow
control blocks or control functions. The "function nodes"
of each control block may in turn be composed of other
blocks (see Figure 5). In fact, "top down" strategy starts
the initial design with one block, and further refines each
function into other blocks until the lowest level of
specification is coded. This strategy allows for a maximum
integration of segments, modules and programs with a minimum
amount of design time. Each functional block has one entry
and one exit point which excludes the overlap of functions
and increases program reliability.


Along with reliability is the need for readability and
ease of debugging. In SP this is enhanced by grouping
blocks of code (5 to 9 functions limited to 10 to 120
lines of code) into segments which appear on one to three
pages of source listing. These segments form a module which
in turn forms a program.

Development proceeds top-down and breadth first in SP.
All segments of one level are developed in a left to right
process, based on sequential order or complexity, before the
next level is refined and tested.

An extremely useful and powerful tool that allows
for the support of structured programming (especially if the
language is assembler) is a macro. A macro is used to
extend some underlying language - to perform a translation
from one language to another. In assembly language a macro
is a means of specifying that a symbol appearing in the code
field of a statement actually stands for a group of
instructions [ Ref. 7 ]. The use of macros when the user is
writing in assembler can substantially ease the user's task
in the following ways:

a) Often, a small group of instructions must be repeated
   many times throughout a program with only minor changes
   for each repetition. Macros can reduce the tedium (and
   the resultant increased chance for error) associated with
   these operations.

b) If an error in a macro definition is discovered, the
   program can be corrected by changing the single
   occurrence of the definition and recompiling/reassembling.
   If the same routine had been repeated many times
   throughout the program without using macros, each
   occurrence would have to be located and changed. Thus
   debugging time is decreased.

c) Duplication of effort between programmers can be reduced.
   Once the best and most efficient coding of a particular
   function is discovered, the macro definition can be made
   available to all other programmers.

d) New and useful instructions can easily be simulated.

e) Macros assist in program readability and documentation
   [ Ref. 7 ].

The user of a microcomputer system must often use
assembler language as his source language to do his
programming. Therefore if the system is to allow for direct
support of structured programming it must have a macro
capability.
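
The idea of a code-field symbol standing for a group of instructions can be sketched as a simple text substitution. The following Python sketch is illustrative only and is not the macro facility of any particular assembler. The function name `expand_macros`, the macro-table layout and the sample macro `ADDM` are assumptions; for simplicity, source lines are assumed to carry no label field, so the first word of a line is its code field.

```python
import re


def expand_macros(source_lines, macros):
    """Replace each macro call by its body with arguments substituted.

    macros maps a macro name to (parameter_names, body_lines).
    A line is treated as a macro call when its first word (the code
    field, since labels are not modelled here) is a macro name.
    """
    output = []
    for line in source_lines:
        fields = line.split(None, 1)
        if not fields or fields[0] not in macros:
            output.append(line)          # ordinary instruction: copy through
            continue
        params, body = macros[fields[0]]
        args = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
        if len(args) != len(params):
            raise ValueError(f"{fields[0]}: expected {len(params)} argument(s)")
        binding = dict(zip(params, args))
        for body_line in body:
            # Substitute whole identifiers only, never parts of other words.
            output.append(re.sub(r"[A-Za-z_]\w*",
                                 lambda m: binding.get(m.group(0), m.group(0)),
                                 body_line))
    return output


# Hypothetical macro: add the byte stored at address SRC to the accumulator.
MACROS = {"ADDM": (["SRC"], ["LXI H,SRC", "ADD M"])}
EXPANDED = expand_macros(["MVI A,0", "ADDM TOTAL", "ADDM COUNT"], MACROS)
```

Correcting the body of `ADDM` in the table changes every expansion at the next assembly, which is the debugging benefit described in item b).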

Another feature which the authors believe would
complement an interactive development system in support of
structured programming is a "stub handler." A stub handler
permits the user to define from the keyboard, or from system
default memory locations, identifiers and labels which were
referenced in the source program but which were not
evaluated or defined. An example would be a jump
instruction to a label which has not yet appeared in the
program. At the point in the code generation phase of
assembly when a reference is made to an undefined identifier
or label, the user is notified and the terminal keyboard is
opened for its definition, allowing code generation to
continue with the user supplied value.

The utility of this mechanism becomes more apparent
when it is realized that the user may now write his programs
initially using calls to modules or subroutines which have
not yet been written. This allows the driver portions of
the program to assume their final form early on in the
development process. System default values for undefined
constants could be zero. For undefined subroutines, a call
to a location which contains a return instruction would
allow execution of the resultant code in most cases. The
user is now allowed to convert otherwise non-executable
compilations into executable form whether the missing
symbolic definition was intentional or accidental.
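
The default rules just described can be sketched in a few lines. This Python fragment is a hedged illustration, not the ASID stub handler itself: the function name `resolve`, the `ask` callback standing in for the terminal keyboard, and the address `STUB_ADDRESS` are all hypothetical. The value 0xC9 is the 8080 RET opcode.

```python
RET_OPCODE = 0xC9       # 8080 RET instruction
STUB_ADDRESS = 0x0100   # hypothetical location reserved for the stub


def resolve(name, kind, symbols, memory, ask=None):
    """Return the value of `name`, defining it on the spot if undefined.

    kind is "constant" or "subroutine". ask, if given, stands in for the
    keyboard: it is called with the name and returns a user-supplied
    value, or None to accept the system default.
    """
    if name in symbols:
        return symbols[name]
    value = ask(name) if ask is not None else None
    if value is None:
        if kind == "subroutine":
            # A call to the stub location returns immediately.
            memory[STUB_ADDRESS] = RET_OPCODE
            value = STUB_ADDRESS
        else:
            value = 0                    # default for undefined constants
    symbols[name] = value                # later references see the same value
    return value
```

With such a routine a driver that calls an unwritten subroutine still assembles and runs; the call simply returns at once until the real routine is supplied.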

A. FORMAL GRAMMARS AND PARSING

A complete specification of a programming language must
perform at least two functions. First, it must specify the
syntax of the language. Second, it must specify the
semantics of the language; that is, what meaning or intent
should be attributed to each syntactically correct program
[ Ref. 2 ].

A compiler for a programming language must verify that
its input obeys the lexical and syntactic conventions of the
language specification. It must also translate its input
into an object language in a manner that is consistent with
the semantic specification of the language. This
translation is referred to as code generation. In addition,
if the input contains syntactic errors, the compiler should
announce their presence and try to pinpoint their location.
To help perform these functions every compiler has a device
within it called a parser. Further discussion of parsers
requires a review of some basic definitions. The
development below generally follows that of Aho and Johnson
[ Ref. 3 ].


A grammar is used to define a language and to impose a
structure on each sentence in the language. A context-free
grammar can often be used to help specify the syntax of a
programming language. In addition, if the grammar is
designed carefully, much of the semantics of the language
can be related to the rules of the grammar.

In a context-free grammar, two disjoint sets of symbols
are used, terminal and nonterminal symbols (sometimes called
syntactic categories). In the grammar, one nonterminal
symbol is distinguished as the start symbol.

A context-free grammar itself consists of a finite set
of rules called productions. A production has the form
left-side => right-side, where left-side is a single
nonterminal symbol and right-side is a string of zero or
more terminal and/or nonterminal symbols. The arrow is
simply a special symbol that separates the left and right
sides.


A grammar is a rewriting system. If aAb is a string of
grammar symbols and A => c, then aAb => acb can be written
and it can be said that aAb directly derives acb. A
sequence of strings a_0, a_1, ..., a_n such that
a_{i-1} => a_i for 1 <= i <= n is a derivation of a_n from
a_0. That is, a_n is derivable from a_0. For each
derivation in a grammar a corresponding derivation tree can
be constructed. A derivation tree is a tree whose outermost
leaves form a set of terminal symbols in a grammar, whose
root is the start symbol, and whose interconnecting nodes
form a set of productions of the grammar. Derivation trees
are important because they are associated with the parse of
a sentence in a grammar.


The start symbol of a grammar, or any string derivable
from the start symbol, is a sentential form. A sentential
form containing only terminal symbols is said to be a
sentence generated by the grammar. The language generated
by a grammar G, often denoted by L(G), is the set of
sentences generated by G.

A rightmost derivation is defined to be a derivation in
which at each step the rightmost nonterminal in the current
sentential form is rewritten to obtain the next sentential
form. Each sentential form derived in this manner is called
a right sentential form.
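
A short worked example may help fix these definitions. The Python sketch below is an illustration only; the three-production grammar (E => E + T, E => T, T => id) and the function names are assumptions chosen for the example and do not come from the references. Each step rewrites the rightmost nonterminal, so every string produced is a right sentential form.

```python
# Hypothetical example grammar: nonterminal -> list of right sides.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["id"]],
}


def rightmost_step(form, right_side):
    """Rewrite the rightmost nonterminal of `form` using `right_side`."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in GRAMMAR:
            if right_side not in GRAMMAR[form[i]]:
                raise ValueError(f"no production {form[i]} => {right_side}")
            return form[:i] + right_side + form[i + 1:]
    raise ValueError("sentential form contains no nonterminal")


def derive(start, choices):
    """Return the list of right sentential forms, starting from `start`."""
    forms = [[start]]
    for right_side in choices:
        forms.append(rightmost_step(forms[-1], right_side))
    return forms


# E => E + T => E + id => T + id => id + id
FORMS = derive("E", [["E", "+", "T"], ["id"], ["T"], ["id"]])
```

The last form, `id + id`, contains only terminal symbols and is therefore a sentence of the language generated by this grammar.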


If aAw is a right sentential form in which w is a string
of terminal symbols, and aAw => acw, then c is the handle of
acw.

A prefix of ac in the right sentential form acw is said
to be a viable prefix of the grammar. Restating this
definition, a viable prefix of a grammar is any prefix of a
right sentential form that does not extend past the right
end of a handle in that right sentential form. There is
always some string of grammar symbols that can be appended
to the end of a viable prefix to obtain a right sentential
form. Viable prefixes are important in the construction of
left-to-right scanning compilers with good error-detecting
capabilities; as long as the portion of the input that has
been seen can be derived from a viable prefix, no error has
yet occurred.


Frequently, the interest in a grammar is not only in the 
language it generates, but also in the structure it imposes 
on the sentences of the language. This is the case because 
grammatical analysis is closely connected with other 
processes, such as compilation and translation, and the 
translation or actions of the other processes are frequently 
defined in terms of the productions of the grammar. 


A parser for a grammar is a device which, when presented 
with an input string, attempts to construct a derivation 
tree that matches the input. If the parser can construct 
such a tree, then it will have verified that the input 
string is a sentence of the language generated by the 
grammar. If the input string is syntactically incorrect, 
then the tree construction process will not succeed and the 
positions at which the process falters can be used to 
indicate possible error locations [ Refs. 8 and 9 ]. 


A parser can operate in many different ways. One parser 
that is efficient for a context-free grammar and well suited 
for use in compilers for programming languages is an LR 
parser [ Refs. 8, 9, 25 ]. 


An LR parser examines the input string from left to 
right, one symbol at a time. It attempts to construct the 
derivation tree "bottom-up"; i.e. from the leaves of the 
derivation tree to the root. An LR parser operates by 
reconstructing the reverse of a rightmost derivation for the 
input. This is known as a right parse. An LR(1) parser 
looks at only the next input symbol before taking an action 
step. 


An LR parser deals with a sequence of partially built 
trees during its tree construction process. This sequence 
of trees is referred to as a forest. The forest is 
constructed from left to right as the input is read. 

There are four types of parsing actions that an LR 
parser can make: shift, reduce, accept (announce completion 
of parsing), or announce error. 

In a shift action, the next input symbol is removed from 
the input. A new node labeled by this symbol is added to 
the forest at the right as a new tree by itself. 


In a reduce action, a production is specified. A 
reduction by a production causes a new node to be created 
and labeled and the rightmost n roots in the forest (which 
will have already been labeled) to be made direct 
descendants of the new node, which then becomes the 
rightmost tree of the forest. Here n is the number of 
symbols on the right side of the production. 


The parser operates by repeatedly making parsing actions 
until either an accept or error action occurs. 


In order to completely specify an LR parser for a 
grammar, two tables need to be specified: the parse action 
table which specifies which actions to take (shift, reduce, 
accept, or error) with the input symbol depending upon what 
state the parser is in, and the goto table which specifies 
which state the parser is to be in for the next parse 


action. 
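
The interplay of the two tables can be sketched as follows. This is a 
minimal Python illustration, not the ASID parser: the grammar 
(E => E + n | n), the hand-built ACTION and GOTO tables, and the name 
lr_parse are all assumptions chosen to keep the example small. 

```python
# Hypothetical grammar with numbered productions:
#   1: E => E + n
#   2: E => n
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}  # left side, length of right side

# Parse action table: (state, input symbol) -> action. Missing entries are errors.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
# Goto table: (state, nonterminal) -> next state.
GOTO = {(0, "E"): 1}


def lr_parse(tokens):
    """Return the right parse (production numbers in order of reduction)."""
    symbols = list(tokens) + ["$"]  # "$" marks the end of the input
    stack = [0]                     # stack of parser states
    pos = 0
    right_parse = []
    while True:
        entry = ACTION.get((stack[-1], symbols[pos]))
        if entry is None:
            raise SyntaxError(f"error at input position {pos}")
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            left, length = PRODUCTIONS[arg]
            del stack[-length:]                    # pop one state per symbol
            stack.append(GOTO[(stack[-1], left)])  # consult the goto table
            right_parse.append(arg)
        else:  # accept
            return right_parse


print(lr_parse(["n", "+", "n"]))
```

For the input n + n the sketch reduces by production 2 and then by 
production 1. For the input n n it raises an error at position 1, the 
first symbol that cannot continue a valid prefix. 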


A properly constructed LR(1) parser can parse a large 
class of useful languages called the deterministic 
context-free languages. It has a number of notable 
properties: (1) It reports error as soon as possible 
(scanning input from left to right). (2) It parses a string 
in time proportional to the length of the string. (3) It 
requires no rescanning of previously scanned input 
(backtracking). (4) The parser can be generated mechanically 
for a wide class of grammars, including all grammars which 
can be parsed by recursive descent with no backtracking and 
those grammars parsable by operator precedence techniques. 


RELATION OF SYNTAX ERROR DETECTION AND CORRECTION TO 
PARSING 


A properly designed LR parser will announce that an 
error has occurred as soon as there is no way to make a 
valid continuation to the input already scanned. 
Unfortunately, it is not always easy to decide what the 
parser should do when an error is detected; in general, this 
depends on the environment in which the parser is operating 
[ Ref. 8 ]. Any scheme for error recovery must be carefully 
interfaced with the lexical analysis and code generation 
phases of compilation, since these operations typically have 
"side effects" which must be undone before the error can be 
considered corrected. In addition, a compiler should 
recover gracefully from each error encountered so that 
subsequent errors can also be detected. 


LR parsers can accommodate a wide variety of error 
recovery stratagems. In place of each error entry in each 
state, an error correction routine can be inserted which is 
prepared to take some extraordinary actions to correct the 
error [ Ref. 8 ]. Identification of the state frequently 
provides enough context information to allow for the 
construction of sophisticated error recovery routines. 


Certain automatic error recovery/correction actions are 
also possible. In particular, the automatic error 
correction methods described below can be incorporated 
within an LR parser. 


AUTOMATIC SYNTAX ERROR DETECTION AND CORRECTION 


A very substantial fraction of the time and effort 
required to develop a program is devoted to the removal of 
errors. Any compiler should, as much as possible, help the 


programmer in this chore [ Ref. 10 ]. 


Early compilers simply rejected programs as soon as an 
error was detected, vaguely describing the error and where 
it was discovered. At the present time, many compilers try 
to find as many errors as possible. The term error recovery 
is used to designate the process of determining how to 
continue analyzing a source program when an error is 
detected. 


Several compilers, most notably the compilers for CORC 
and PL/C, try to "correct" all errors, generate code 
and actually execute the program. The term error correction 
is used to designate the process which, given an incorrect 
program, transforms it into a correct one. The "goodness" of 
the process can be measured in some sense by the difference 
between the corrected program and what the programmer 
actually meant. Users of error-correcting compilers find it 
substantially faster and easier to remove errors from 
programs than with conventional compilers, since, no matter 
how many syntactic errors there are, they still have a 
chance to find logic or run time errors [ Ref. 10 ]. 


The advantage of error correction over error recovery is 
twofold. First, error-correction techniques must be much 
more precise than error recovery in diagnosis of the error; 
therefore they provide the programmer with a better 
description of his errors. Second, minor errors do not stop 
a program from executing, and there is a good chance it will 
be corrected in the right manner [ Ref. 10 ]. 


Error recovery and error correction are concerned with 
errors in syntax and in semantics. Logic errors cannot be 
detected and are therefore not subject to automatic 
correction. As for semantic errors, only ad hoc recovery 
techniques exist. Several are described in Gries [ Ref. 11 ]. 

Misspelling can lead to syntax or semantic errors. When 
such errors are detected, some compilers try to determine if 
a spelling error actually occurred. The first work on the 
subject is due to Freeman [ Ref. 12 ]. Freeman's algorithm 
estimates the probability that an identifier is the 
misspelling of another. Morgan [ Ref. 13 ] has devised a 
more efficient, but less powerful, method which checks only 
for the following errors: one letter is wrong, one letter is 
missing, an extra letter is inserted or two adjacent 
characters are transposed. 
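
The four error classes listed above can be tested with a few string 
comparisons. The sketch below is a Python illustration of those four 
checks only; it is not taken from Morgan's paper, and the function name 
is_near_miss is an assumption. 

```python
def is_near_miss(candidate, known):
    """True if candidate differs from known by exactly one of the four
    errors: one wrong letter, one missing letter, one extra letter, or
    two adjacent characters transposed."""
    if candidate == known:
        return False
    if len(candidate) == len(known):
        diffs = [i for i in range(len(known)) if candidate[i] != known[i]]
        if len(diffs) == 1:
            return True  # one letter is wrong
        return (len(diffs) == 2
                and diffs[1] == diffs[0] + 1
                and candidate[diffs[0]] == known[diffs[1]]
                and candidate[diffs[1]] == known[diffs[0]])  # transposition
    if abs(len(candidate) - len(known)) == 1:
        # One letter missing or one extra letter: deleting a single
        # character from the longer string must give the shorter one.
        longer, shorter = sorted((candidate, known), key=len, reverse=True)
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False


for typed in ("MVO", "MOB", "MV", "MOOV", "ADD"):
    print(typed, is_near_miss(typed, "MOV"))
```

A compiler using such a test would compare an unrecognized identifier 
against the names already in its symbol table and treat a single match 
as a probable misspelling. 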


The synopsis below of the most characteristic methods 
for syntax error recovery is derived from summaries by Levy 
[ Ref. 10 ] and Graham and Rhodes [ Ref. 22 ]. 


McKeeman [ Ref. 14 ] describes an admittedly primitive 
technique similar to techniques used in many bottom up 
parsers. It uses special characteristics of particular 
languages. The compiler writer gives a list of "important" 
symbols, like ";" and "end." When an error is detected, all 
input symbols are examined and discarded until one is found 
which is in the list. Then the symbols on the top of the 
stack are successively examined and discarded until the 
current input symbol can legally follow what remains of the 
stack. 
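
The two discarding steps just described can be sketched as follows. 
This is a hedged Python illustration rather than McKeeman's code: the 
function recover, the set LEGAL and the predicate can_follow are 
hypothetical stand-ins for whatever tables a real parser would consult. 

```python
def recover(stack, tokens, pos, important, can_follow):
    """Resynchronize after an error; returns the new input position.

    stack is modified in place. can_follow(top, symbol) is assumed to say
    whether symbol may legally follow the stack whose top is `top`.
    """
    # Step 1: discard input symbols until one in the list is found.
    while pos < len(tokens) and tokens[pos] not in important:
        pos += 1
    if pos == len(tokens):
        return pos  # input exhausted; nothing to resynchronize on
    # Step 2: discard stack symbols until the input symbol can follow.
    while stack and not can_follow(stack[-1], tokens[pos]):
        stack.pop()
    return pos


# Toy data: only a statement list may be followed by ";".
LEGAL = {("stmt-list", ";")}
stack = ["stmt-list", "id", ":=", "+"]
tokens = ["*", "x", ";", "y"]
pos = recover(stack, tokens, 0, {";", "end"},
              lambda top, sym: (top, sym) in LEGAL)
print(stack, pos)
```

In the toy run the symbols "*" and "x" are skipped, and the stack is 
cut back to the statement list, after which ";" can be accepted. 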


For simple precedence parsers [ Ref. 15 ], two papers 
cover the problem of recovery. In these parsers, errors can 
be detected in one of two cases: the incoming input symbol 
is illegal, or the top of the stack does not constitute a 
phrase. 

Wirth [ Ref. 16 ] has a strategy for each case. When 
the incoming symbol is illegal, a list of "insertion 
symbols" is scanned. If some symbol of the list is legal 
between the top of the stack and the incoming symbol, it is 
inserted. Otherwise the input symbol is stacked. When a 
reduction cannot take place because the topmost symbols of 
the stack do not constitute a phrase, a table of erroneous 
productions is scanned comparing the right parts with the 
top of the stack. If a match is found, the reduction is 
performed and the analysis can proceed. The choice of the 
appropriate insertion symbols and erroneous productions 
requires a thorough understanding of the analysis algorithm 
on the part of the compiler designer, as well as a subtle 
feeling to anticipate frequent misuse of the syntax. Wirth 
claims that this method yields quite satisfying results. 
While this technique handles expected errors well, 
unexpected errors can cause trouble. 


Leinus [ Ref. 17 ] approaches the same problem more 
systematically. The recovery procedure consists of three 
basic steps where the three-step sequence is executed 
repeatedly until recovery is complete: (1) Isolate a 
potential phrase. (2) Construct the set of possible 
"reductions" for the potential phrase. (3) Recover by 
selecting one of the nonterminal symbols in the set to 
replace the phrase; if the selection attempt fails, repeat 
from step (1). The actual process is complex. Leinus does 
not thoroughly justify the choice of his algorithm, but it 
has the merit of being systematic. 


Five more heuristic methods exist, none of which make 


use of special features of a particular language. 


Gries's scheme [ Ref. 11 ] works for bottom-up parsers 
such that an error is detected when the input symbol is 
illegal to follow the stack (for bottom-up parsers, the stack 
contains the head of the sentential form). It tries, 
whenever an error is detected, to insert a substring in 
front of the current input symbol, such that the substring 
is legal in the context constituted by some stack symbols at 
the right-hand side and by the input symbol at the left. If 
no such substring exists, the current input symbol is 
discarded and the process repeated with the next input 
symbol. This technique has been successfully used in an 
intuitive manner for error recovery in a compiler using 
transition matrices [ Ref. 18 ]. The problem with this 
technique is that it requires a substantial amount of 
programming effort for the error recovery portion of the 
compiler. Furthermore, although such a method handles the 
expected errors reasonably well, it can fail badly on 
unanticipated errors. 


Irons [ Ref. 19 ] has developed an error-recovery method 
for a top-down parser. A top-down parser constructs a 
derivation tree starting with the top node, or start symbol, 
and successively adds lower branches and nodes. In order to 
avoid backup, Irons's parsing algorithm constructs several 
candidate syntax trees in parallel. At any step during the 
parse, one or more trees have been constructed; some 
branches are incomplete. An error is detected when no 
tree can be built further. Then all input symbols 
are successively examined and discarded until one is found 
which is a potential node of some incomplete branch. A 
terminal string is determined such that, if inserted before 
this input symbol, the continuation of the parse would cause 
this symbol to be correctly linked to the incomplete branch. 
The string is inserted and the parse continues. Irons's 
technique uses much more context than that of Gries because 
the parse is top-down and contextual information is easy to 
extract from the incomplete trees. 


LaFrance [ Refs. 20 and 21 ] describes a recovery 
technique for parsers using Floyd Production Language. When 
an error is detected in a state where only one next action 
is possible, this action is taken. Otherwise, a set of 
intuitive and predetermined rules for transforming the top 
of the stack and a fixed number of subsequent symbols is 
used. Which rule to apply is determined by comparing the 
actual symbols with the set of symbols which "could legally 
be there." The process looks for a match according to a 
predetermined set of patterns. With each pattern is 
associated a transformation. For example, if the current 
input string is abcd and if bacd is legal in the current 
context, then 'a' and 'b' are transposed in the input string. 
Thus, this process performs transformations which are more 
complex than those performed by the methods of Gries and 
Irons. The problem with this approach is that if multiple 
parsing continues for an unbounded number of steps, an 
explosion in space and time ensues. LaFrance bounds the 
amount of multiplicity. This improves efficiency, but can 
yield insufficient information in some cases. 


Levy [ Ref. 10 ] describes a model for error correction 
for formal languages which have one-way deterministic 
acceptors. This process makes "local" corrections over 
clusters of errors, using the context around the errors to 
determine the correction and to insure that the different 
local corrections performed on the string do not interfere 
with one another. The error-correction process is embedded 
in left-to-right recognizers. The parsing of correct 
strings is not slowed down by the presence of the 
error correction mechanism. This mechanism uses the 
recognizer both to detect errors and to find possible 
corrections. 


Levy has attempted to find a theoretical basis for error 
correction in all deterministic context-free parsing methods 
having the correct prefix property. His method includes a 
backward move on the input to determine the entire left 
context of the error discovery point that could contain the 
error and then parallel parses from the beginning of the 
left context to pursue all possible minimal distance 
corrections of a fixed bound distance. This method has the 
same problem as LaFrance's method. Levy has proposed some 
heuristics to improve its efficiency. 

Graham and Rhodes [ Ref. 22 ] describe an error recovery 
method which can be incorporated in any bottom up parser 
which does not back up. The error recovery routines are 
invoked when a syntax error is detected by the parser. 
Control is returned to the parser when the error state has 
been removed. The method attempts to analyze the context in 
which the error occurs. There are two phases in the method, 
a condensation phase and a correction phase. 


In the condensation phase, an attempt is first made to 
make further reductions on the stack, preceding the point of 
error detection. This attempt is a "backward" move. A 
"forward" move is an attempt to parse the input just beyond 
the point of error detection. The forward move terminates 
either because a second error is detected further on in the 
input, or more likely, because the only possible next 
parsing action is a reduction involving that part of the 
stack containing the detected error. The forward and 
backward moves are an attempt to summarize the context 
surrounding the point at which the error was detected. 

The correction phase considers changes to sequences of 
symbols, rather than isolated changes to simple symbols, so 
that as much as possible of the context that surrounds the 
error can be efficiently exploited. The quality of recovery 
can be traded for efficiency in choosing a correction. The 
idea is to change the parsing stack, at the point of error, 
to a right-hand side of a production of the grammar or to 
one or more prefixes of right-hand sides which "fit in" in 
the sense that they can legitimately occur in the given 
context. In general, there usually is more than one 
possible change that appears locally to correct the error. 
To increase the likelihood that the change really corrects 
the error and to provide helpful diagnostic information to 
the programmer an effort is made to choose the "best" 
correction. This is accomplished by determining which of 
the possible locally correct changes requires a minimum of 
symbol by symbol modification of the parsing stack. 


The Graham-Rhodes method has been empirically tested 
against the Wirth method and the method used on the PL/C 
compilers and it appears to be qualitatively better than 
either the Wirth method or the PL/C compiler method. 


Hopcroft and Ullman [ Ref. 23 ], and Smith [ Ref. 24 ] 
have studied formal error correction. The papers are highly 
theoretical, examine only the very specific case where 
errors are just substitution of symbols, and their 
mechanisms are very complex and time consuming. In 
particular, the time needed to parse a correct string is 
considerably greater than if a usual parser is utilized. 


V. ASID DESIGN 


A. PURPOSE 


The user can best be aided by the computer during the 
software development process if the man-machine interface is 
such that the user can obtain immediate results from his 
endeavors. In most systems the user writes his program or a 
significant portion of a program and then submits his code 
to a language processor to see if it is syntactically 
correct. If the code segment was syntactically correct and 
the code segment has all the necessary semantic parts to 
make it a program, the user would then attempt execution of 
his code. 


The authors' contention was that the user would be 
better served during the initial entry of a program by 
directing the computer to analyse small portions of his code 
for syntax immediately as each segment is input. If a 
syntax error is detected, the user is informed immediately 
and may make the necessary changes before continuing. After 
syntactic analysis the user's input could be executed. For 
this to be done a segment must be defined as some 
arbitrarily small element of the source language. This 
segment is determined by the grammar which is used to define 
the language in which the user is writing. In the CAPS 
System each character is analysed as it is introduced into 
the input stream. Thus the user is constantly assured that 
his input up to the last segment is some valid prefix of a 
source language program. 

A more powerful extension of the immediate parse would 
be immediate execution of the source program as executable 
segments appear in the input stream, with the results of 
each such execution being displayed to the user. At this 
point the user knows that his code is syntactically correct 
and he has seen a detailed view of its execution to assist 
in detecting errors in the logic of his program. 


The computer would now be much more able to assist the 
user in producing a running and logically correct program on 


the first attempt. 


As a realization of the need previously expressed for a 
software development system which could aid the typical user 
to the fullest extent the authors have designed a support 
program named ASID. 


ASID accepts INTEL 8080 assembler language as defined by 
the INTEL Corporation [ Ref. 7 ] and emits machine code for 
the 8080 Microprocessor. ASID is written in PL/M and runs 
in consonance with the CP/M operating system [ Ref. 30 ] on 
an 8080 based microcomputer with at least 20 kilobytes of 
memory and an auxiliary storage device capable of random 
access of records of stored data. 


B. MAJOR FUNCTIONAL BLOCKS 


The structure of ASID is that of a two-pass 
assembler as presented in Barron [ Ref. 29 ]. Figure 6 
shows the working relationships between the classical 
functions of a two-pass assembler and shows the integration 
of the monitor, stub handler, error module, and execution 
monitor. 

Figure 6 - ASID system diagram 

The desirability of a stub handler was expressed 
earlier. The user may select to toggle the stub handler on 
or off. If the stub handler toggle is off and a reference 
is made to an undefined label or identifier, the run is 
aborted. If the toggle is on then the stub handling module 
of ASID is called from the code generating portion of the 
assembler's second pass when a reference is made to an 
identifier or label which is not defined in the program 
segment currently being processed. When the stub handler 
toggle is on then two sources of definition are available to 
the program. One is selected constant values which are 
provided by ASID. The other source is the user. The stub 
handler prints the name of the identifier or label on the 
terminal and opens the keyboard for input of a definition. 


The monitor function is responsible for initializing the 
program and allowing the user to change the system toggles 
at any time. This allows the user to amend the amount of 
interaction and the types of products ASID provides. The 
monitor function also includes the looping mechanisms which 
cause the parsing action to immediately process each 
sentence as it appears in the input stream. 


The error handler, when the user has opted for the 
incremental mode of operation, causes immediate notification 
to the user of the detection of a syntax error. The token 
(or in the case of a scanner error, the character) which was 
not a valid input is shown to the user and the keyboard is 
opened to accept another attempt at a well-formed input 
line. In the noninteractive assembly mode, the entire line 
in which the error occurred is deleted and ASID execution 
continues to parse and check for syntax but no attempt is 
made to generate executable code. 


The execution monitor is used in incremental mode only 
and performs a controlled execution of the source program, 
line by line as each line completes a successful parse 
action. The state of the machine as the user has formed it 
is copied and restored at the beginning and end of each 
instruction execution. The stub handler provides the 
mechanism to patch up forward references. 


Operation utilizing a previously created file is 
also possible with or without the incremental feature. 


Those instructions which cause an alteration of the 
program counter or which use register M as an operand must, 
in the present system, be interpreted rather than actually 
executed as their execution would most likely result in 
alteration of a portion of ASID system code or data area. 
Although this is not true execution of the user's source 
code in the pure sense, it does provide the user with the 
opportunity to view the results of all register and 
accumulator operations as an aid to detection of logic 
errors. 


At the completion of initial program entry, a 
conventional code generation pass is run on the user's 
program, creating an executable file. 


BASIC SYSTEM OPERATION 


The dialect of 8080 assembler language which is 
processed by ASID is described in a formal grammar in 
Appendix A. The input stream is scanned for tokens by the 
scanner module which is driven by a state transition matrix. 
The tokens built by the scanner are processed by the parser 
section which operates from a set of tables generated by the 
LR(1) parser algorithm [ Ref. 25 ]. Both the scanner and 
the parser are able to detect syntactical errors in the 
input. All error recovery action is initiated from the 
parser. 


Most assemblers form the actual machine code words for 
those assembler mnemonics which have a register or two 
registers as operands through some regular mathematical 
combination of a base value (unique for each instruction 
type) and the numerical value associated with each register 
designation. Such regular combinations are built into the 
inner workings of the CPU. The INTEL 8080 is no exception 
to this pattern. Following such a scheme, "MOV M,M" 
produces a "HLT" instruction. Also, "LXI A,_" would produce 
"LXI SP,_." The grammar for ASID does not allow the 
parser to recognize such constructions. The parser signals 
a syntax error if they appear in the input stream. 
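
The MOV case can be checked with a short calculation. The sketch below 
(Python, not ASID's PL/M code) uses the standard 8080 register numbers 
and the MOV base value 40H; the names mov_opcode and assemble_mov are 
assumptions introduced for the illustration. 

```python
# Standard 8080 register numbers (M denotes the memory operand).
REG = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}
MOV_BASE = 0x40  # MOV dst,src is encoded as 01 ddd sss
HLT = 0x76


def mov_opcode(dst, src):
    """Regular combination of the base value and the two register numbers."""
    return MOV_BASE | (REG[dst] << 3) | REG[src]


def assemble_mov(dst, src):
    """Reject the one combination whose bit pattern is really HLT."""
    if dst == "M" and src == "M":
        raise SyntaxError("MOV M,M is not a valid instruction")
    return mov_opcode(dst, src)


print(hex(mov_opcode("A", "B")))   # an ordinary register move
print(mov_opcode("M", "M") == HLT)  # the regular scheme collides with HLT
```

Rejecting the combination in the grammar, as ASID does, has the same 
effect as the explicit test in assemble_mov. 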


Four files are maintained by the ASID system during 
program creation. These files are a source file, a macro 
reference file, an intermediate code file and an executable 
code file. Although ASID does not incorporate a text 
editor, any files created by an ASID user can be edited by 
the CP/M Context Editor [ Ref. 31 ]. The editing functions 
which are available to the user while creating a program 
with ASID are those console line editing functions provided 
by CP/M. Specifically, the user is limited to removing the 
last character or removing the entire line. 


Macros are processed by ASID in line with the source 
program. The requirement for the user is that he define all 
macros prior to their use in his program. A call to an 
undefined macro will cause the ASID session to abort. 
Presently, macro calls may not be used inside of macro 
definitions. Such use of a macro will cause ASID to enter 
an undetermined state. There is effectively no limit to the 
number of formal parameters allowed with a macro definition. 
Formal parameters have "scope" only within their declared 
macro body. 


The user may gain access to the monitoring routines in 
ASID to alter any or all of the system toggles whenever the 
terminal keyboard is open for input. The user establishes 
contact with the monitoring section by typing the attention 
character "!" immediately followed by the toggle number and 
the value to which the toggle is to be set. Several toggles 
may be set on the same command line, each toggle reference 
separated from the next by a comma. A semicolon must 
terminate an attention line. 


Input to ASID is free form. Embedded blanks are 
important as delimiters. Any number of blank spaces is 
treated as if it were just one blank. Comments may be 
freely inserted anywhere in the program text between the 
beginning and ending comment delimiters "< ... >." Each 
ASID statement must be terminated by a semicolon. Multiple 
statements may appear on one line, and statements may be 
continued to the next line. 


If the user has opted for incremental mode operation of 
ASID, each sentence of source code is scanned, parsed, 
converted to intermediate code form and executed. This 
execution step may involve the stub handler if the source 
code sentence used an identifier or label which is a forward 
reference. This process will continue until the pseudo 
operation code "END" is processed. END causes the entire 
intermediate file to be passed through the code generation 
phase to build a complete executable file. 


Upon completion of incremental execution of a line of 
code, the user may access the copy of the CPU for his 
machine and alter the values of the registers. Access is 
gained by entering "$" followed by the values the user 
desires to insert. The order of input is significant, 
inputs must be separated by commas and an input must be 
present for each register pair. Ordering of inputs is as 
follows: A and Flags, B and C, D and E, H and L. All inputs 
must be in hexadecimal radix and the line must terminate 
with a semicolon. 
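
A line of this form could be decoded along the following lines. The 
sketch is in Python rather than PL/M and rests on assumptions that the 
text does not settle: that each input is one 16-bit value per register 
pair, and that blanks around the commas are tolerated. The name 
parse_register_line is hypothetical. 

```python
REGISTER_PAIRS = ("AF", "BC", "DE", "HL")  # order required by the text


def parse_register_line(line):
    """Decode '$v1,v2,v3,v4;' into a dict of register pair values."""
    text = line.strip()
    if not text.startswith("$") or not text.endswith(";"):
        raise ValueError("line must begin with '$' and end with ';'")
    fields = [field.strip() for field in text[1:-1].split(",")]
    if len(fields) != len(REGISTER_PAIRS):
        raise ValueError("one input is required for each register pair")
    values = [int(field, 16) for field in fields]  # hexadecimal radix
    for value in values:
        if not 0 <= value <= 0xFFFF:  # assumed: 16 bits per register pair
            raise ValueError("value does not fit in a register pair")
    return dict(zip(REGISTER_PAIRS, values))


registers = parse_register_line("$01FF,0203,0405,0607;")
print(registers)
```

A missing input, a non-hexadecimal digit or a missing semicolon all 
raise ValueError in this sketch, which corresponds to rejecting the 
line and reopening the keyboard. 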
VI. ASID IMPLEMENTATION 


BASIC ASSEMBLER FUNCTIONS 


1. Scanner 


The scanner is driven by a state transition matrix. 
A state transition matrix requires two entering arguments, 
the current state of the process and the next item of 
information to be processed. These two arguments typically 
correspond to the rows and columns of a two dimensional 
linear array. The data items in the array are used to 
designate a specific action from a list of possible actions 
to be executed or accomplished. 


One state must be designated as the initial state 


from which all further processing will take place. Once the 
initial state has been designated the processing can 
proceed. 


Primitives used by the scanner deliver the input 
stream one character at a time. Each character delivered is 
used as the column index into the state transition matrix. 
The row index is determined by the current state of the 
process. The resulting data item extracted from the state 
transition matrix is used as an index into a CASE statement 
which will take one or a combination of several actions. 
These actions include adding the character to a 32 byte 
array used as an accumulator, modifying the current state of 
the process, recognizing and reading through comments, 
recognizing the monitor commands, detecting and reporting 
malformed tokens, and recognizing tokens. 
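
The mechanism can be sketched in a few lines. The following Python 
illustration is not the ASID scanner: the three states, the action 
names, and the use of character classes (rather than individual 
characters) as column indices are simplifying assumptions, and an 
if/elif chain stands in for the PL/M CASE statement. 

```python
START, IN_IDENT, IN_NUMBER = range(3)        # rows: current state
LETTER, DIGIT, BLANK, SPECIAL = range(4)     # columns: character class
(SKIP, BEGIN_IDENT, BEGIN_NUMBER, APPEND,
 EMIT, EMIT_KEEP, EMIT_SPECIAL, ERROR) = range(8)

MATRIX = [
    # LETTER      DIGIT         BLANK  SPECIAL
    [BEGIN_IDENT, BEGIN_NUMBER, SKIP,  EMIT_SPECIAL],  # START
    [APPEND,      APPEND,       EMIT,  EMIT_KEEP],     # IN_IDENT
    [ERROR,       APPEND,       EMIT,  EMIT_KEEP],     # IN_NUMBER
]


def classify(ch):
    if ch.isalpha():
        return LETTER
    if "0" <= ch <= "9":
        return DIGIT
    if ch.isspace():
        return BLANK
    return SPECIAL


def finish(state, accum):
    # Numbers are converted to binary form before being delivered.
    return ("number", int(accum)) if state == IN_NUMBER else ("identifier", accum)


def scan(text):
    tokens, state, accum, i = [], START, "", 0
    text += " "  # a trailing blank flushes the last token
    while i < len(text):
        ch = text[i]
        action = MATRIX[state][classify(ch)]  # index, not a chain of tests
        if action == SKIP:
            i += 1
        elif action in (BEGIN_IDENT, BEGIN_NUMBER):
            accum = ch
            state = IN_IDENT if action == BEGIN_IDENT else IN_NUMBER
            i += 1
        elif action == APPEND:
            accum += ch
            i += 1
        elif action in (EMIT, EMIT_KEEP):
            tokens.append(finish(state, accum))
            state = START            # reset to the initial state
            if action == EMIT:
                i += 1               # EMIT_KEEP re-examines ch from START
        elif action == EMIT_SPECIAL:
            tokens.append(("special", ch))
            i += 1
        else:
            raise ValueError(f"malformed token at column {i}")
    return tokens


print(scan("MVI A,10;"))
```

Changing how a character is handled in a given state means changing 
one entry of MATRIX; the body of scan stays the same. 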


The scanner has the ability to recognize and deliver 
directly to the parser "Special character" tokens, "string" 
data items and "number" tokens. All numbers are converted 
to binary form before parsing action is initiated. All 
other alphanumeric strings (identifiers) that the scanner 
recognizes are placed in the accumulator. The string in the 
accumulator also has had a hashing value computed. When the 
scanner has determined it has a valid token it delivers the 
token to the parser and resets the current state of the 


scanner process to the initial state. 


The scanner was implemented as a state transition 
machine because the authors felt it would be fast, easily 


modified and easily understood. 


The speed advantage is realized due to the fact that 
the next action to perform with a particular input character 
is not determined by a series of conditional tests but 
by a straightforward index into a matrix and a CASE 
statement. 


During the development of ASID the authors noted 
that the actions to be taken with a specific character could 
be changed easily by just altering the entry in the state 
transition matrix, thus alleviating the need for a major 
change in the code of the scanner module itself. This 
characteristic saved much development time and allowed 
corrections to the original logic design to be easily 
implemented. 


2. Symbol Table 


The symbol table stores attributes of program 
entities such as identifiers, labels and macro names. The 
information stored in the symbol table is built and 
referenced by the assembler. The symbol table data 
structure is a declared linear array whose first 780 bytes 
are initialized to contain the reserved word list. 


The user portion of the symbol table is an unordered
linear list of entries. Individual elements are accessed
through a chained hash addressing scheme. Each entry in the
hash table heads a linked list whose printnames all evaluate
to the same hash address. A zero in the hash table
indicates no entries exist on that particular chain. During
references to the symbol table the accumulator, the global
variable ACCUM, contains the string of ASCII characters
which comprise the token, with the first byte being the
length of the printname. The global variable HASHSSUM is
set to the sum modulo 128 of the ASCII characters in the
printname. Entries are chained in the order of their
occurrence in the program.
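
The chained hash scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the ASID source: the class and method names are hypothetical, entries are held in a Python list rather than a byte array, and index 0 is reserved so that a zero can serve as the "no entry" pointer.

```python
HASH_SIZE = 128


class SymbolTable:
    """Chained hash table over a linear list, with list indexes used as pointers."""

    def __init__(self):
        self.hash_table = [0] * HASH_SIZE  # 0 means no entries on this chain
        self.entries = [None]              # slot 0 is reserved as the null pointer

    @staticmethod
    def hash_sum(name):
        # Sum of the ASCII character codes, modulo 128.
        return sum(name.encode("ascii")) % HASH_SIZE

    def lookup(self, name):
        """Return the index of the entry for name, or 0 if it is absent."""
        idx = self.hash_table[self.hash_sum(name)]
        while idx != 0:
            if self.entries[idx]["printname"] == name:
                return idx
            idx = self.entries[idx]["collision"]
        return 0

    def enter(self, name, attribute=0, value=None):
        """Add name if it is new and return its index."""
        found = self.lookup(name)
        if found:
            return found
        self.entries.append(
            {"attribute": attribute, "collision": 0, "value": value, "printname": name}
        )
        new_idx = len(self.entries) - 1
        h = self.hash_sum(name)
        if self.hash_table[h] == 0:
            self.hash_table[h] = new_idx
        else:
            # Walk to the end of the chain so entries stay in order of occurrence.
            idx = self.hash_table[h]
            while self.entries[idx]["collision"] != 0:
                idx = self.entries[idx]["collision"]
            self.entries[idx]["collision"] = new_idx
        return new_idx
```

For example, the names AB and BA have the same character sum, so they share one chain, with AB (entered first) at its head.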


Each entry in the symbol table consists of a
variable-length vector of four entries. These are
attribute, collision pointer, value and printname. The
first item in the printname field is the length of the
name. The maximum length of a printname in the symbol
table is eight characters, although 31 characters will be
accepted by the scanner. When a search of the symbol table
is being conducted, the first items to be compared are the
respective length fields. Only if the lengths are the same
will a character by character match be attempted.


Macro names have the head-of-chain pointer for their
parameter lists stored in the value field. When a macro
expansion is being processed, all searches for tokens begin
with the parameter list chain. If the item is not found
then the symbol table is entered at the global level and the
search continues. This method establishes local scope for
macro formal parameters.
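
The search order that gives macro formal parameters local scope can be sketched as below. The function name and the data shapes (a list of pairs for the parameter chain, a dictionary for the global level) are assumptions made for this Python illustration only.

```python
def resolve(name, global_symbols, macro_params=None):
    """Return (scope, value) for name.

    macro_params is the parameter chain of the macro being expanded,
    or None when no macro expansion is in progress.
    """
    if macro_params is not None:
        # During an expansion the parameter chain is searched first.
        for param_name, value in macro_params:
            if param_name == name:
                return ("parameter", value)
    # Not found locally: continue the search at the global level.
    if name in global_symbols:
        return ("global", global_symbols[name])
    return (None, None)
```

A parameter named X therefore hides a global X while the macro is being expanded, and the global X is visible again afterwards.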


The first 780 bytes of the symbol table contain all
the assembly language mnemonics. These entries appear in
the hash list in the same manner as user defined
identifiers. The reserved word entries contain, in the
attribute word, information concerning whether this is a
pseudo operation or an actual assembler instruction, if an
assembler instruction how many data bytes will follow the
word of machine code, the token number for that type of
instruction for information to the parser, and the method
which must be used to incorporate any register operands into
the machine code word. The value field of a reserved word
is one byte and contains the basic value needed to compute
the actual machine code word corresponding to that
instruction. The collision pointer field and printname
field are identical to user defined items, as all elements
present in the symbol table are searched in the same manner.
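
As an illustration of how a base value and a register-incorporation method can yield the machine code word, the Python sketch below uses the standard 8080 encodings for a handful of instructions. The field names, token labels and method labels are hypothetical; the text does not give the layout of ASID's attribute word.

```python
from collections import namedtuple

# Hypothetical layout of a reserved word entry.
Reserved = namedtuple("Reserved", "is_pseudo data_bytes token method base")

# Standard 8080 three-bit register codes.
REG = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}

RESERVED = {
    "NOP": Reserved(False, 0, "no_operand", "none", 0x00),
    "ADD": Reserved(False, 0, "one_register", "src", 0x80),      # 10000SSS
    "INR": Reserved(False, 0, "one_register", "dst", 0x04),      # 00DDD100
    "MOV": Reserved(False, 0, "two_register", "dst_src", 0x40),  # 01DDDSSS
    "MVI": Reserved(False, 1, "register_data", "dst", 0x06),     # 00DDD110
    "EQU": Reserved(True, 0, "equate", "none", 0x00),
}


def opcode(mnemonic, *regs):
    """Merge register operands into the base value of a reserved word."""
    entry = RESERVED[mnemonic]
    if entry.is_pseudo:
        raise ValueError(mnemonic + " is a pseudo operation and has no opcode")
    code = entry.base
    if entry.method == "src":
        code |= REG[regs[0]]               # register goes in bits 0-2
    elif entry.method == "dst":
        code |= REG[regs[0]] << 3          # register goes in bits 3-5
    elif entry.method == "dst_src":
        code |= (REG[regs[0]] << 3) | REG[regs[1]]
    return code
```

Under these assumptions `opcode("MOV", "A", "B")` gives 0x78, and the `data_bytes` field of MVI tells the code generator that one data byte follows the opcode.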


When designing ASID the authors decided to use a
declared linear array with subscripts as pointers rather
than "based variables and pointers" which are also supported
by PL/M. This choice was made to increase the readability
of the ASID source program.


With the reserved word list and the symbol table
combined as one data structure, only one set of procedures
was needed for the search and information
extraction functions.


3. Parser


The parser used by ASID is a table-driven pushdown
automaton of the LR(1) type described in section IV A. It
receives a stream of tokens from the scanner and analyzes
them to determine if they form a sentence of the 8080
assembler language. The 8080 assembler grammar is designed
so that each statement parses to a complete sentence,
causing a source program to appear as a series of sentences.
When an error is detected and ASID is in the incremental
mode, the parser gives an error message to the user and
tells him with which token the error was discovered. The
sentence in error is deleted and the parser reinitialized.
The user is then allowed to reenter the line which was in
error with a correction. In the nonincremental mode the
line is simply deleted, the parser reinitialized, and the
next sentence ("program") parsed. The major data structures
in the parser are the parse tables and the parse stack.


The reasons an LR(1) parser was used in the ASID
system were its efficient operation with a context-free
grammar, its error detecting capabilities, its ability to
accommodate a wide range of error recovery/correction
strategies, and its ability to be generated easily and
mechanically for the grammar the authors used to describe
the 8080 assembler language. The one disadvantage of using
an LR(1) parser with the ASID system is that it requires a
large amount of memory.


4. Pass 2, Code Generation


In addition to verifying the syntax of source
statements, the parser also acts as a transducer by
associating semantic actions with reductions. Each time the
parser determines that a reduction should take place, the
procedure CODER is called with the production number as a
parameter. Some productions have no semantic actions
associated with them. Various global variables contain
pertinent information concerning the previously parsed
tokens. This information is used by the various production
actions to manipulate the symbol table, build an
intermediate code file and manipulate the macro reference
file.
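
The coupling between reductions and semantic actions can be shown with a small table-driven shift-reduce parser in Python. The two-production toy grammar (1: E -> E + n, 2: E -> n) and its hand-built tables are assumptions for illustration; they are not ASID's grammar or parse tables, and the `coder` callback merely stands in for the CODER procedure.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}

# Parse tables for the toy grammar; "$" marks the end of the sentence.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def parse(tokens, coder):
    """Parse one sentence; call coder(production_number) on every reduction.

    Returns (True, None) on acceptance, or (False, token) naming the
    token at which a syntax error was discovered.
    """
    stack = [0]  # the parse stack holds state numbers
    stream = list(tokens) + ["$"]
    pos = 0
    while True:
        entry = ACTION.get((stack[-1], stream[pos]))
        if entry is None:
            return False, stream[pos]
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]                   # pop the right-hand side
            stack.append(GOTO[(stack[-1], lhs)])  # push the goto state
            coder(arg)                            # semantic action for this production
        else:
            return True, None
```

Passing `calls.append` as `coder` while parsing `["n", "+", "n"]` records the production numbers in the order the reductions occur, which is the order in which the semantic actions would build intermediate code.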


Because macros and the pseudo operations SET and EQU
are processed in-line during pass one, several logical flags
are needed to coordinate the actions of the parser and the
CODER procedure. These flags control the redirection of
program products and source to and from the macro file.


If the user has opted for incremental mode, then as
each sentence of source completes parsing (accept state is
reached) the intermediate code for that sentence is passed
to the executable code generator. Procedure EXGEN
determines whether or not the last sentence was an assembler
instruction or a pseudo operation. Assembler instructions
are represented by the appropriate eight bit machine code
word. If a data value or address is needed to complete an
instruction, the expression is evaluated and the symbol
table is consulted as needed. If the symbol table entry
indicates that this item is undefined (has not been given a
value) the stub handler is called.


When the CODER sees the pseudo operation END it
signals the assembler to reset its input back to the
beginning of the intermediate code file and pass the entire
program through EXGEN. The result is the executable version
of the user's program. The stub handler is available during
this portion of ASID execution.


B. MACRO PROCESSING


To more fully support the user of 8080 assembler
language, ASID possesses a macro processor. The macro
processor employed by ASID is an in-line processor, rather
than a separate preprocessor. This was made possible by
including the macro definition and macro call in the formal
grammar. When the parser is in the process of verifying
input and it discovers a macro definition or macro call then
certain logical flags are set to facilitate the redirection
of the intermediate code generated by procedure CODER.


When a macro definition is being processed, the
processing of the macro body is handled in the same manner
as non-macro statements with the exception that the
intermediate code is directed to the macro reference file,
and no incremental execution is performed. This redirection
of intermediate code terminates when the pseudo operation
ENDM is seen. ENDM closes the macro file for the definition
being processed and redirects the intermediate code back to
the intermediate file associated with the main body of the
program.


When a macro call is recognized, the macro reference
file is searched to locate the corresponding macro
definition. If the search is successful, the corresponding
macro definition, which has already been converted to
intermediate code form, is copied into the main program
intermediate code file. If ASID is in incremental mode,
each sentence of the macro definition will be executed
incrementally as it is inserted in the main program
intermediate file. If the macro definition is not found a
fatal error occurs and the ASID session is aborted.
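
The redirection of intermediate code during macro definition and macro call can be sketched in Python as follows. The class and its method names are hypothetical, the two files are modeled as in-memory lists, and parameter substitution is left out; it is a sketch of the control flow only, not of ASID's file handling.

```python
class MacroProcessor:
    """Routes intermediate code to the main file or to the macro reference file."""

    def __init__(self):
        self.main_file = []    # intermediate code of the main program
        self.macro_file = {}   # macro name -> recorded intermediate code
        self.defining = None   # acts as the logical flag: name being defined, or None

    def begin_macro(self, name):
        """A macro definition was recognized: redirect code to the macro file."""
        self.defining = name
        self.macro_file[name] = []

    def end_macro(self):
        """ENDM was seen: direct code back to the main file."""
        self.defining = None

    def emit(self, code):
        """Store one sentence of intermediate code in the current destination."""
        if self.defining is not None:
            self.macro_file[self.defining].append(code)
        else:
            self.main_file.append(code)

    def call(self, name):
        """A macro call was recognized: copy the stored definition into the main file."""
        if name not in self.macro_file:
            # The text treats a missing definition as a fatal error.
            raise LookupError("macro definition not found: " + name)
        self.main_file.extend(self.macro_file[name])
```

Because the definition is stored as intermediate code, a call costs only a copy; the macro body is not scanned or parsed again.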


C. ERROR MODULE


The current error module is very crude in the ASID
system. It is called by the parser when a syntax error is
detected. Its major function is to undo what has been done
by the CODER and symbol table manager up to the point in the
parse where the error was discovered. This consists mainly
of resetting pointers and flags back to their proper values
so that a graceful recovery from the error can be
accomplished and further errors can be detected. If the
system is in the incremental mode, "error snowballing" is
eliminated as the user is asked to correct the error before
another parse with a different statement is done. As
mentioned in the parser section of ASID implementation and
in section IV B, an error module that is capable of
automatic error recovery/correction would certainly be
feasible and desirable in the ASID system. The main
disadvantage of an automatic error recovery/correction
module would be the amount of memory needed to incorporate
it in the system.


D. INCREMENTAL EXECUTION MONITOR


The incremental execution monitor is operative only if
the user has opted to employ ASID in the incremental mode.
When enabled, this portion of ASID causes an actual
execution of the latest line of the user's program. Certain
instructions, however, cannot be allowed to execute because
only one line of the user's object code is loaded into the
execution buffer. If a jump or call instruction were
actually executed, control of the computer would be lost.


At the end of each execution cycle the contents of the
CPU registers may be displayed at the terminal. The user
may then check out the logic of his program in some detail
before entering the next line.


During incremental execution no attempt is made to
preserve previously executed instructions or user defined
data areas. Consequently, instructions which contain
references to register M invoke the stub handler prior to
execution. All information stored in the symbol table is
available during incremental execution.


If it is determined that the current instruction can be
incrementally executed then the machine instruction word and
associated data byte or bytes are loaded into a four-byte
array in memory. A RST 1 instruction is executed, the CPU
registers are copied into memory and the contents of the
registers the way the user left them at the end of his last
execution are placed into the CPU. The program counter is
next set to the address of the user's instruction. The two
bytes set aside for user instruction data were initialized
to zero (NOP instructions). The fourth byte in the user
execution buffer is a RST 2 instruction which invokes a
routine which copies the user's machine state into memory
and restores ASID.
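
The layout of the four-byte execution buffer can be sketched in Python using the standard 8080 opcodes. The function names are hypothetical, and the check covers only the jump and call instructions named in the text; other control transfer instructions are left out of this illustration.

```python
NOP = 0x00
RST_2 = 0xD7  # 8080 encoding of RST 2 (11 010 111)


def is_jump_or_call(opcode):
    """True for JMP, CALL and the conditional jump and call opcodes of the 8080."""
    if opcode in (0xC3, 0xCD):              # JMP, CALL
        return True
    return (opcode & 0xC7) in (0xC2, 0xC4)  # Jcc = 11CCC010, Ccc = 11CCC100


def build_execution_buffer(opcode, data=()):
    """Return the four bytes: instruction, two data bytes, RST 2."""
    if is_jump_or_call(opcode):
        raise ValueError("jump and call instructions are not executed incrementally")
    if len(data) > 2:
        raise ValueError("an 8080 instruction has at most two data bytes")
    buf = [opcode, NOP, NOP, RST_2]  # unused data bytes stay zero, i.e. NOP
    buf[1:1 + len(data)] = data
    return bytes(buf)
```

For a one-byte instruction the two zero bytes execute as NOPs, so control always falls through to the RST 2 in the fourth byte regardless of instruction length.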


If the user has requested the display of registers then
ASID causes his machine state to be printed at the terminal.
ASID is now ready to accept the next line of input.


It might seem that there have been rather severe
limitations placed on which instructions can be
incrementally executed. The important thing to be
remembered is that at this point in program creation with
most other systems, all the user has done is to create a
portion of a source file or punch a deck of cards. He has
no information whatsoever concerning the correctness of his
syntax let alone what the program will do when it is
executed. Using ASID, however, the user is assured of
correct syntax and has seen the results of controlled
execution of a portion of his program.


VII. SYSTEM EVALUATION


A. AREAS OF APPLICATION 


The authors felt that ASID would have a high degree of
applicability in the educational environment. The time
taken to learn any programming language is significantly
reduced when an interactive system capable of immediate
feedback is utilized by the student. When contrasting
assembly language to a higher level language the user of
assembly language is not bound by detailed rules for
structuring segments of code. This attribute of assembly
language creates an environment in which it is difficult for
the user to continually co-ordinate the individual steps
involved in his program with the overall desired result.
ASID allows the user to receive immediate feedback
concerning the actual results of execution of each
instruction as it is entered. This enables a student user
to verify that his instructions are causing the operation
which he intended. ASID also eliminates such troublesome
instruction constructs as MOV M,M and LXI A, _.


The experienced user can also benefit from ASID because
it was designed to support structured programming techniques
and should reduce initial program entry/assembly time.


B. EXTENSIONS


ASID in its present form is a complete system. There
are certain additional features and functions which the
authors feel would enhance the operation of the system.


A "mini" text editor would be advantageous. This would
allow the user to correct only the token which is in error
rather than the present convention of requiring the entire
line to be re-entered. Insertion and deletion of tokens
would also be allowed.


The incremental execution monitor could be expanded in
function to include the building of a memory map of the
user's program as it is being processed in incremental mode.
This would enable the controlled execution of the presently
troublesome JUMP, CALL and memory reference instructions as
long as the references and labels are defined in the program
prior to their use as an operand. Forward referencing could
still be the responsibility of the stub handler.


The error module could be expanded to include one of the
automatic syntax error correction schemes presented in
section IV C. The authors specifically recommend the method
developed by Graham and Rhodes [Ref. 22]. This method is
designed to be easily interfaced with an LR(1) parser.

The facility for modifying the user's CPU is somewhat
cumbersome in its present form. An extension could be to
provide selective modification of one or more of the
registers.


Another extension pertains to the macro processor.
Specifically, ASID in its present form does not allow the
nesting of macro definitions nor does it allow macro calls
within macro definitions. Both of these features would
greatly enhance the overall utility of the system.


Presently ASID does not provide a cross-reference
listing of user defined variable names, nor does the source
code file receive an automatic formatting treatment, whereby
label, instruction and operand fields are aligned using
preset tab positions. Both of these features would be
useful to the user. A facility that allows the user to
specify a library routine and have it linked to his program
is another extension that the authors feel would be of great
value to the user.







VIII. SUMMARY AND RECOMMENDATIONS 


ASID was an attempt by the authors to integrate state of
the art compiler writing techniques, a highly interactive
environment and an 8080 based microcomputer mainframe. The
resultant system processes 8080 assembler language
incrementally with each complete line or sentence of
assembler code being the unit of interaction. In addition,
ASID is capable of performing a controlled execution of each
instruction after a successful assembly of the line. A
macro processor was included in ASID which processes macro
definitions and macro calls in-line on the first pass of the
assembly.


The authors have concluded that:
1) Assembly language for the 8080 can be described by a
context-free grammar which is recognized and accepted by
an LALR-1 parser.
2) Incremental compilation during initial program entry
on a dedicated hardware system is feasible. No undue
delays or interruptions were perceived by the user during
program entry.
3) Macros can be processed effectively in-line.
4) Incremental execution is feasible.


The present form of ASID, although runnable, is not
appropriate for public distribution. The source for ASID is
available through the Chairman of the Department of Computer
Science, Naval Postgraduate School, Monterey, California
93940. Several areas for further extensions of the system
have been mentioned and the authors feel that ASID is well
suited as the basis for further research, particularly in
the areas of error recovery and incremental execution.


APPENDIX A


FORMAL GRAMMAR FOR ASID


Soak [> 
<PRIMARY INSTRUCTION>


<LABELED INSTRUCTION>


<BASIC INSTRUCTION>
Seok SLE MNEUMCNIC> 


ee 


ee 
<PRIMARY INSTRUCTION>
<LABELED INSTRUCTION>
<MACRO DEFINITION>

<MACRO CALL>
<BASIC INSTRUCTION>

<LABEL>

<LABEL> <BASIC INSTRUCTION>


<A SO DB LEER 


<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<MNON 
<LEFT 
<MNON 


“NEDMONTC> 
ZERO> 

UNER? SREGISTERNS? 
ONER>SREGISTENG> 
ONER? <REGISTERDS? 
ONER> <REGISTERH?> 
ONERPƏ<REGISTERPS> 
BNERP>SEEGISTERN> 
BNERP><SREGISTERSP 
ONES SREGTISTERPS> 
SUIESTSSREGISTERH> 
OASIS <REGISTERPSN> 
OUNEAS SEXPRESSION> 
ONEADEFN><DLIST> 
UPRO BRE R DO > ICE 
ONEBRSTD<SEXPRESSION> 
ONELS>SREGISTERPS> 
PART><EXPRESSION> 
TWOUV><REGISTERM> , 


<REGISTER 11> 


<MNON 


TWOMV><REGISTERH> „, 





SRE@ESTERPS? 


<MNON TWOMVD<REGISTERMD , 
SERE SUS PER 


<ENON TWOMV><REGISTERI> , 
< REGISTERM? 


<MNON TWOMV><REGISTERH> , 
Seal San ka 


<MNON TWOMVXREGISTERPS> , 
<REGISTERM> 


<MNON TWOMV><REGISTERI> , 
SHREGISTER] > 


<MNON TWOMV><REGISTERPS> , 
SREGISTERPS> 


SMNON TROMV><REGISTERH> , 
SREGISTERH> 


SNIONFTROMYTZSREGISTERPS?> , 
< REGISTERT? 


SON TWOMY>O<CREGISTERID>D , 
< REGIS TERPS2 


ZAIONFTWOHY>SREGISTERPS> , 
SREGISTERG> 


<MNON TWOMV>XREGISTERH> , 
< R BGISTERPS? 


< YVON PWOMV><REGISTERE? , 
<REGIS LER (> 


<MNONTTWORVOSREGISTERID , 
<REGISTERH> 


SUNOMETAONT><SREGISTERI> , 
SERERESSTON> 


SBIO IETKONT><SREGISTERPS> , 
<SERBRESSEON> 


<MNON TWOMID<REGISTERHD , 
< EXPRESSION 


<<AN METOM T><REGISTERM>? ; 
CEXPRESSIONS 


NOOO L>SREGISTERPS? , 
<EXPRESSION> 


<MNON TWOLI-><REGISTERHA> , 
<EXPRESSION> 


<UNON. TIOLI>CREGISTERSP> , 
SERERSSSLON> 


<IDENTIFIER> EQU
<IDENTIFIER> SET
<IDENTIFIER> MACRO

<IDENTIFIER> MACRO <PLIST>
<MACROSE>


N


<LEFT PART>


<MACRO DEFINITION> :


<MACRO CALL> :
<REGISTER1>


<REGISTERPS>


~


<REGISTERSP>
<REGISTERPSW>
<REGISTERH>
<REGISTERM>
<MACROG> <PARM LIST>


O >


= 5. SO & QQ td E E


<DLIST> ::= <EXPRESSION>

<EXPRESSION> ::= <LOGICAL TERM>
               | <EXPRESSION> OR <LOGICAL TERM>
               | <EXPRESSION> XOR <LOGICAL TERM>

<LOGICAL TERM> ::= <LOGICAL FACTOR>
                 | <LOGICAL TERM> AND <LOGICAL FACTOR>

<LOGICAL FACTOR> ::= <LOGICAL PRIMARY>
                   | NOT <LOGICAL PRIMARY>

<LOGICAL PRIMARY> ::= <ARITHMETIC EXPRESSION>

<ARITHMETIC EXPRESSION> ::= <TERM>
                          | <ARITHMETIC EXPRESSION> + <TERM>
                          | <ARITHMETIC EXPRESSION> - <TERM>
                          | - <TERM>
                          | + <TERM>

<TERM> ::= <PRIMARY>
         | <TERM> * <PRIMARY>
         | <TERM> / <PRIMARY>
         | <TERM> % <PRIMARY>
         | <TERM> MOD <PRIMARY>
         | <TERM> SHL <PRIMARY>
         | <TERM> SHR <PRIMARY>

<PRIMARY> ::= <IDENTIFIER>
            | <NUMBER>
            | ( <EXPRESSION> )
            | <STRING>
            | <EQU NAME>
            | <SET NAME>
            | <DEFINED LABEL>

<PLIST> ::= <IDENTIFIER>
          | <PLIST> , <IDENTIFIER>

<PARM LIST> ::= <OPERAND>
              | <OPERAND> , <PARM LIST>

<OPERAND> ::= <REGISTER1>
            | <REGISTERPS>
            | <REGISTERH>
            | <REGISTERM>
            | <REGISTERSP>
            | <REGISTERPSW>
            | <EXPRESSION>